AFFINELY INVARIANT MATCHING METHODS WITH
DISCRIMINANT MIXTURES OF PROPORTIONAL
ELLIPSOIDALLY SYMMETRIC DISTRIBUTIONS

By Donald B. Rubin and Elizabeth A. Stuart

Harvard University and Johns Hopkins University 

In observational studies designed to estimate the effects of in- 
terventions or exposures, such as cigarette smoking, it is desirable 
to try to control background differences between the treated group 
(e.g., current smokers) and the control group (e.g., never smokers) 
on covariates X (e.g., age, education). Matched sampling attempts 
to effect this control by selecting subsets of the treated and con- 
trol groups with similar distributions of such covariates. This pa- 
per examines the consequences of matching using affinely invariant 
methods when the covariate distributions are "discriminant mixtures 
of proportional ellipsoidally symmetric" (DMPES) distributions, a 
class herein defined, which generalizes the ellipsoidal symmetry class 
of Rubin and Thomas [Ann. Statist. 20 (1992) 1079-1093]. The re- 
sulting generalized results help indicate why earlier results hold quite 
well even when the simple assumption of ellipsoidal symmetry is not 
met [e.g., Biometrics 52 (1996) 249-264]. Extensions to conditionally 
affinely invariant matching with conditionally DMPES distributions 
are also discussed. 

1. Background. The goal in many applied projects is to estimate the 
causal effect of a treatment (e.g., cigarette smoking) from nonrandomized 
data by comparing outcomes (e.g., lung cancer rates) in treated (e.g., cur- 
rent smokers) and control (e.g., never smokers) groups, after adjusting for 
covariate differences (e.g., age, education) between the groups. A common 
method is to form matched subsamples of the treated and control groups 
such that the distributions of covariates X are more similar in the matched 
samples than in the original groups. The use of matched sampling has been
receiving more and more attention in fields such as statistics (e.g., [11, 15]),
economics (e.g., [4, 7, 10, 21]), political science (e.g., [9]), sociology (e.g., 
[20]) and medicine (e.g., [1]) as a class of methods for controlling bias in 
such observational studies. Here we provide theoretical guidance for choos- 
ing matching methods that reduce bias in the matched groups, as well as 
guidance on the amount of bias reduction that can be achieved with fixed 
distributions and fixed sample sizes. 

We begin with random samples from the treated and control groups of
fixed sizes $N_t$ and $N_c$, respectively, with X measured in both samples.
Matching chooses subsamples of fixed sizes $N_{mt}$ and $N_{mc}$ from the
original groups on which to measure the outcome variables, as well as
possibly measure additional covariates. Throughout, we use the subscripts
t and c to indicate quantities in the original random samples from the
treated and control groups, and the subscripts mt and mc to indicate the
corresponding quantities in the matched treated and control groups.

We restrict attention to a particular but general class of matching
methods, those that are affinely invariant. In practice, many matching
methods are affinely invariant in the sense that the same matched samples
will be obtained after any full-rank affine transformation of X. For
example, the same matches will be obtained if people's heights are measured
in inches or centimeters, or if their temperatures are measured in degrees
Fahrenheit or degrees Kelvin. Formally, let $X_t$ and $X_c$ be data matrices
(units by variables). A matching method is a mapping from $(X_t, X_c)$ to a
pair of sets of indices (T, C) representing the units chosen in the matched
samples. An affinely invariant matching method results in the same output
(T, C) after any (full-rank) affine transformation A of the X:

$(X_t, X_c) \to (T, C)$ implies $(A(X_t), A(X_c)) \to (T, C)$.

Affinely invariant matching methods include Mahalanobis metric, discrimi- 
nant or propensity score matching. Non-affinely invariant methods include 
methods where one coordinate of X is treated differently from the others or 
where nonlinear estimators of the discriminant (or other metric) are used, 
as discussed by Rubin and Thomas [16]. 
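
The invariance property can be checked numerically. The sketch below is an illustration only, not a procedure from the paper: it assumes a greedy one-to-one nearest-neighbour Mahalanobis matching based on the pooled within-group covariance, and the function name `mahalanobis_match`, the simulated data, and the matrix `A` and shift `b` are all hypothetical. It matches once on the raw covariates and once after a full-rank affine map, and compares the selected control indices.

```python
import numpy as np

def mahalanobis_match(x_t, x_c):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns, for each treated unit in order, the index of its matched control.
    """
    n_t, n_c = len(x_t), len(x_c)
    # Pooled within-group covariance defines the Mahalanobis metric.
    pooled = ((n_t - 1) * np.cov(x_t, rowvar=False)
              + (n_c - 1) * np.cov(x_c, rowvar=False)) / (n_t + n_c - 2)
    inv = np.linalg.inv(pooled)
    available = list(range(n_c))
    matches = []
    for row in x_t:
        diff = x_c[available] - row
        # Squared Mahalanobis distance from this treated unit to each control.
        d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)
        matches.append(available.pop(int(np.argmin(d2))))
    return matches

rng = np.random.default_rng(0)
x_t = rng.normal(loc=0.5, size=(20, 3))    # simulated treated sample
x_c = rng.normal(loc=0.0, size=(100, 3))   # simulated control reservoir

# Hypothetical full-rank affine map x -> A x + b (applied row-wise).
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])
b = np.array([10.0, -5.0, 0.0])

original = mahalanobis_match(x_t, x_c)
transformed = mahalanobis_match(x_t @ A.T + b, x_c @ A.T + b)
print(original == transformed)
```

Because the pooled covariance transforms as $A S A'$, the squared distances are unchanged up to rounding, so the two index lists should agree. A method that, for example, used Euclidean distance on the raw coordinates would generally not pass this check.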

Theoretical results in papers by Rubin and Thomas [16, 17] describe the 
effects of affinely invariant matching on bias reduction, as well as on variance, 
in the matched treated and matched control groups, when X has ellipsoidally 
symmetric distributions (e.g., the normal distribution or the multivariate t) 
in the treated and control groups, with proportional covariances. Rubin and 
Thomas [18] used these theoretical results to obtain a series of approxima- 
tions for the bias and variance reduction possible in a particular matching 
setting using true and estimated propensity scores, with no subsampling of 
the treated sample and normal distributions. They then examined the
performance of these approximations by simulation with ellipsoidal nonnormal
distributions and found that the approximations based on the normal
distribution held remarkably well, even for a t-distribution with 5 degrees of
freedom. They also explored the performance of the approximations with 
real data from a study of prenatal hormone exposure, with 15 ordinal or 
dichotomous covariates. Again, the approximations based on the normal 
distribution were found to hold well, despite the clear deviations from the 
underlying assumptions. 

Later work by Hill, Rubin and Thomas [8] also showed that the Rubin 
and Thomas [18] approximations held quite well with real data in the con- 
text of an evaluation of the New York School Choice Scholarship Program, 
which utilized randomization to award scholarships to eligible participants. 
Out of the large pool of possible controls, a matched sample was chosen for 
follow-up, where the matching was done using an affinely invariant match- 
ing method based on 21 ordinal or dichotomous covariates. Hill, Rubin and 
Thomas compared the bias and variance benefits of choosing matched con- 
trols rather than a random sample of controls. The Rubin and Thomas [18] 
results predict a gain of efficiency for differences in covariate means by a 
factor of approximately two, and Hill et al. showed that this predicted gain 
in efficiency was achieved, despite the markedly nonnormal distributions of 
some of the covariates. 

In this paper we generalize the results of Rubin and Thomas [16, 17, 18] to 
the setting where the treated and control groups' covariate distributions are 
"discriminant mixtures of proportional ellipsoidally symmetric" (DMPES) 
distributions. We see that most, but not all, of the basic results in fact hold 
under these more general conditions, which supports the broader applicability
of these results, as suggested by the empirical evidence referenced above. We 
use as a running example the estimation of the effects of smoking on lung 
cancer, where the results here were used to motivate diagnostics for the 
results of matching [15]. 

2. Discriminant mixtures of ellipsoidally symmetric distributions. An
ellipsoidal distribution for p-component X is a distribution such that a linear
transformation of X leads to a spherically symmetric distribution, which is
defined by the distribution on the radii of concentric hyperspheres on which
there is a uniform probability density. Thus, an ellipsoidal distribution is
specified by its center, inner product and distribution on the radius [5].
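
The three ingredients (center, inner product, radial distribution) translate directly into a sampling recipe. The following is a minimal sketch under our own assumptions: the helper name `sample_ellipsoidal` and its arguments are hypothetical, and the inner product is applied through its Cholesky factor. Choosing the radius as the square root of a chi-squared variable with p degrees of freedom recovers the multivariate normal; other radial laws give other members of the class.

```python
import numpy as np

def sample_ellipsoidal(n, center, inner_product, radius_sampler, rng):
    """Draw n points as center + L (radius * direction), inner_product = L L'."""
    p = len(center)
    g = rng.normal(size=(n, p))
    # Normalized Gaussian vectors are uniform on the unit hypersphere.
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = radius_sampler(n)
    chol = np.linalg.cholesky(inner_product)
    return center + (radii[:, None] * directions) @ chol.T

rng = np.random.default_rng(3)
center = np.array([1.0, -1.0])
inner_product = np.array([[2.0, 0.5],
                          [0.5, 1.0]])

# Radius = sqrt(chi-squared with p = 2 degrees of freedom): bivariate normal.
x_normal = sample_ellipsoidal(
    1000, center, inner_product,
    lambda n: np.sqrt(rng.chisquare(df=2, size=n)), rng)
print(x_normal.shape)
```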

Definition. The distribution of X, F(X), is a "discriminant mixture
of proportional ellipsoidally symmetric" (DMPES) distribution if it possesses
the following properties:

(i) F(X) is a mixture of K ellipsoidally symmetric distributions
$\{F_k;\ k = 1, \dots, K\}$,

(1)  $F(X) = \sum_{k=1}^{K} \alpha_k F_k(X),$

where $\alpha_k \geq 0$ for all $k = 1, \dots, K$, and $\sum_{k=1}^{K} \alpha_k = 1$,
where $F_k$ has center $\mu_k$ and inner product $\Sigma_k$. Hence, the "mixture" (M)
and "ellipsoidally symmetric" (ES) parts of DMPES.

(ii) The K inner products are proportional:

(2)  $\Sigma_i \propto \Sigma_j$ for all $i, j = 1, \dots, K$.

Hence, the "proportional" (P) part of DMPES.

(iii) The K centers are such that all best linear discriminants between
any two components are proportional:

(3)  $(\mu_i - \mu_j)\Sigma_k^{-1} \propto (\mu_{i'} - \mu_{j'})\Sigma_{k'}^{-1}$ for all $i, j, k, i', j', k' = 1, \dots, K$.

Hence, the "discriminant" (D) in DMPES, because all mixture component
centers lie along the common best linear discriminant.

In [16, 17, 18], K = 2, corresponding to the treated and control groups, 
and (2) is assumed; (3) is superfluous in the case with K = 2. 

With DMPES distributions, there exists an affine transformation to a
special canonical form, which is a simple extension of results in [3, 6] and
[14]. This canonical form has, for each mixture component, the property that
the distribution of X is spherical, so that all inner products can be written
as $\sigma_k^2 I$, where I is the p x p identity matrix and $\sigma_k^2$ is a
positive scalar constant, $k = 1, \dots, K$. Moreover, the canonical form has
the component centers lying along the unit vector (unless all $\mu_i = \mu_j$)
so that the centers are $\delta_k U$, where $U = (1, \dots, 1)'$, the p-component
unit vector, and the $\delta_k$ are scalar constants, $k = 1, \dots, K$; if all
$\mu_i = \mu_j$, then all $\delta_k = 0$. Therefore, in canonical form, the
distribution of each component of X is the same, and thus, the distribution
of X is exchangeable, not only within each of the K mixture components, but
also for any collection of mixture components defined by a subset of the
indices $\{1, \dots, K\}$.
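
One way to construct such a transformation is to whiten with a Cholesky factor and then rotate the common discriminant direction onto $U/\sqrt{p}$. The sketch below is an assumption-laden illustration rather than the construction used in [3, 6, 14]: the function name `canonical_transform`, the Householder reflection used as the orthogonal step, and the example numbers are all ours, and it assumes the component centers are not all equal.

```python
import numpy as np

def canonical_transform(base_cov, direction):
    """Return M such that x -> M (x - origin) puts a DMPES mixture in canonical form.

    base_cov:  a matrix proportional to every component's inner product.
    direction: the common (nonzero) direction of differences between centers.
    """
    p = len(direction)
    chol = np.linalg.cholesky(base_cov)      # base_cov = chol @ chol.T
    whiten = np.linalg.inv(chol)             # makes every component spherical
    v = whiten @ direction
    v = v / np.linalg.norm(v)
    u = np.ones(p) / np.sqrt(p)              # target direction U / sqrt(p)
    w = v - u
    if np.allclose(w, 0.0):
        rotate = np.eye(p)
    else:
        # Householder reflection: orthogonal, and maps v onto u.
        rotate = np.eye(p) - 2.0 * np.outer(w, w) / (w @ w)
    return rotate @ whiten

# Hypothetical three-component mixture: Sigma_k = scales[k] * base_cov and
# mu_k = origin + shifts[k] * direction.
base_cov = np.array([[2.0, 0.6, 0.0],
                     [0.6, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
direction = np.array([1.0, -2.0, 0.5])
scales = [1.0, 2.5, 0.4]
shifts = [0.0, 1.0, -0.7]

M = canonical_transform(base_cov, direction)
for s_k, d_k in zip(scales, shifts):
    cov_k = M @ (s_k * base_cov) @ M.T       # should be s_k * I
    center_k = M @ (d_k * direction)         # should be a multiple of U
    print(np.round(cov_k, 6), np.round(center_k, 6))
```

After the map, each component covariance is a scalar multiple of the identity and every component center has equal coordinates, which is the exchangeability used in the results that follow.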

Moreover, further symmetry results can be stated for a DMPES distribution
by decomposing X into its projection along the best linear discriminant,
Z, and its projection orthogonal to Z. Specifically, the standardized best
linear discriminant can be written as

(4)  $Z = U'X / p^{1/2}$

unless all $\delta_k = 0$, in which case Z is defined to be 0, the zero vector.
Also, let W be a standardized one-dimensional linear combination of X
orthogonal to Z,

(5)  $W = \gamma'X, \quad \gamma'Z = 0, \quad \gamma'\gamma = 1.$

All such W have the identical distribution in each mixture component, and
the identical distribution for any collection of mixture components defined
by a subset of the indices $\{1, \dots, K\}$. Thus, the distribution of X
orthogonal to Z has rotational symmetry, that is, is spherically symmetric.

Now suppose $K_t$ of the K mixture components comprise the treatment
group, and $K_c$ components comprise the control group, $K_t + K_c = K$;
$K_t, K_c \geq 1$. Denote the set of treatment group component indices by T and
the set of control group component indices by C, $T \cup C = \{1, \dots, K\}$.
For example,
T identifies current smokers and C identifies never smokers. The previous 
discussion implies that the distribution of X is exchangeable in the treated 
group and in the control group, and moreover, the distribution of X orthog- 
onal to the discriminant Z is spherically symmetric in the treated group and 
in the control group. This is the theoretical distributional setting for our re- 
sults. In the more restrictive setting of [16] with proportional ellipsoidally 
symmetric distributions, X is spherically symmetric in both groups. 

3. Results of matching with affinely invariant methods. When affinely 
invariant matching methods are used with DMPES distributions, the canon- 
ical form given in Section 2 can be assumed without loss of generality. The 
following results, stated in canonical form, closely parallel results from [16]. 
The main symmetry arguments do not change with the use of mixtures of 
distributions. Although most of our results can be written without assuming 
finite first two moments in each mixture component and without restricting 
K to be finite, the extra generality complicates notation and appears to be 
of little practical importance. 

Theorem 3.1. Suppose an affinely invariant matching method is applied 
to random treated and control samples with DMPES distributions. Then 

$E(\bar{X}_{mt}) \propto E(\bar{X}_{mc}) \propto U$

and

$\mathrm{var}(\bar{X}_{mt} - \bar{X}_{mc}) \propto I + cUU', \quad c \geq -1/p,$

where $\bar{X}_{mt}$ and $\bar{X}_{mc}$ are the mean vectors in the matched treated
and control samples, and $E(\cdot)$ and $\mathrm{var}(\cdot)$ are the expectation and
variance over repeated random draws from the initial treated and control
populations. Also,

$E(v_{mt}(X)) \propto I + c_t UU', \quad c_t \geq -1/p,$

$E(v_{mc}(X)) \propto I + c_c UU', \quad c_c \geq -1/p,$

where $v_{mt}(X)$ and $v_{mc}(X)$ are the sample covariance matrices of X in the
matched treated and control groups, respectively. Corresponding formulas
also hold within each of the mixture components. When Z = 0,
$E(\bar{X}_{mt}) = E(\bar{X}_{mc}) = 0$, the zero vector, and $c = c_t = c_c = 0$.

Proof. The proof follows directly from symmetry arguments and is 
essentially the same as that of Theorem 3.1 in [16]. Briefly, with affinely 
invariant matching methods, the matching treats each coordinate of X the 
same and, hence, the exchangeability of the DMPES distributions of X in 
matched treated and control samples is not affected. Thus, the expectations 
of the matched sample means of all coordinates of X must be the same and, 
hence, the expectation of X must be proportional to U in each matched 
group. Analogously, the covariance matrices of X must be exchangeable 
in each matched group. The general form for the covariance matrix of
exchangeable variables is proportional to $I + cUU'$, $c \geq -1/p$. When Z = 0,
the direction U is no different from any other, that is, there is complete 
rotational symmetry and, hence, the simplification. □ 
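
The bound on c in the exchangeable form $I + cUU'$ comes from its eigenvalues: 1 with multiplicity p - 1 (directions orthogonal to U) and 1 + cp (along U), so the matrix is a valid covariance matrix only when 1 + cp is nonnegative. A short numerical check, with the helper name `exchangeable_cov` and the values of p and c chosen purely for illustration:

```python
import numpy as np

def exchangeable_cov(p, c):
    """Return the p x p matrix I + c U U', with U the vector of ones."""
    U = np.ones((p, 1))
    return np.eye(p) + c * (U @ U.T)

p = 5
for c in (0.5, 0.0, -1.0 / p, -0.3):
    smallest = np.linalg.eigvalsh(exchangeable_cov(p, c)).min()
    # Eigenvalues are 1 (p - 1 times) and 1 + c * p (once, along U).
    print(f"c = {c:+.2f}: smallest eigenvalue = {smallest:.3f}, "
          f"1 + c*p = {1 + c * p:.3f}")
```

The last value of c lies below -1/p and produces a negative eigenvalue, so it cannot arise as a covariance matrix.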

Corollary 3.1. The quantities $\mathrm{var}(\bar{W}_{mt} - \bar{W}_{mc})$, $E(v_{mt}(W))$ and
$E(v_{mc}(W))$ take the same three values for all standardized W orthogonal
to Z. In addition, for each mixture component, $E(v_{mk}(W))$ takes the same
value for all W, where $v_{mk}(W)$ is the sample variance of W in the matched
mixture component $k \in T$ or C.

Proof. The corollary follows from the fact that, due to the rotational 
symmetry in matched samples implied by Theorem 3.1 orthogonal to the 
discriminant, any W will have the same distribution. □ 

4. The effects on a linear combination of X of affinely invariant matching
relative to random sampling. As in [16, 17, 18, 19], it is natural to describe
the results of matching by its effects on a linear combination of X,
$Y = \beta'X$, where, for convenience, we assume Y is standardized,
$\beta'\beta = 1$. Any such Y can be expressed as the sum of projections along
and orthogonal to the best linear discriminant,

(6)  $Y = \rho Z + (1 - \rho^2)^{1/2} W,$

where $\rho$ is the correlation between Y and Z. When Z = 0, Y = W and $\rho = 0$.
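
In canonical form the decomposition in (6) reduces to splitting the coefficient vector $\beta$ into its part along $U/\sqrt{p}$ and its part orthogonal to U. The sketch below uses a hypothetical $\beta$ and treats $\rho$ as the projection coefficient of $\beta$ on $U/\sqrt{p}$, which coincides with the within-component correlation of Y and Z when each component is spherical; it assumes $\beta$ is not itself proportional to U.

```python
import numpy as np

p = 4
U = np.ones(p)
z_dir = U / np.sqrt(p)                  # Z = z_dir' X, equation (4)

beta = np.array([0.5, -0.1, 0.7, 0.2])  # hypothetical coefficients of Y
beta = beta / np.linalg.norm(beta)      # standardize so that beta'beta = 1

rho = beta @ z_dir                      # coefficient of Y on Z
resid = beta - rho * z_dir              # part of beta orthogonal to U
gamma = resid / np.linalg.norm(resid)   # W = gamma' X, equation (5)

# Equation (6) at the level of coefficients: beta = rho z + sqrt(1 - rho^2) gamma.
recombined = rho * z_dir + np.sqrt(1.0 - rho**2) * gamma
print(np.allclose(recombined, beta), np.isclose(gamma @ U, 0.0))
```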

It is also natural, as in [16, 17, 18, 19], to compare the results of the match- 
ing to random sampling done in an affinely invariant way, such as randomly 
sampling from the original treated and control groups, thereby sampling 
from each component in proportion to its fraction in the population [the a's 
in (1)], or randomly sampling from each component with fixed proportions, 



MATCHING WITH DMPES DISTRIBUTIONS 



where the same fixed proportions would be used in matching. We will refer
to the treated and control samples generated by any such random sampling
by indices rt and rc, respectively, where $N_{rt} = N_{mt}$ and $N_{rc} = N_{mc}$, but
generally, of course, $N_t > N_{rt}$ and $N_c > N_{rc}$.

The following corollaries decompose the effects on Y of affinely invariant 
matching on X into the effects of the matching on Z and on W, relative 
to random sampling. Assuming the formulation from Section 2, we have the 
following results. 

Corollary 4.1. (a) When $E(\bar{Z}_{rt} - \bar{Z}_{rc}) \neq 0$, the matching is equal
percent bias reducing (EPBR), as defined by [14],

(7)  $\dfrac{E(\bar{Y}_{mt} - \bar{Y}_{mc})}{E(\bar{Y}_{rt} - \bar{Y}_{rc})} = \dfrac{E(\bar{Z}_{mt} - \bar{Z}_{mc})}{E(\bar{Z}_{rt} - \bar{Z}_{rc})}.$

Because the right-hand side of the above equation takes the same value for
all Y, the percent bias reduction is the same for all Y.

(b) When Z = 0, the numerator and denominator of both ratios in
equation (7) are 0.

(c) When $E(\bar{Z}_{rt} - \bar{Z}_{rc}) = 0$ but $Z \neq 0$, the denominators of both
ratios in equation (7) are 0, and then
$E(\bar{Y}_{mt} - \bar{Y}_{mc}) = \rho E(\bar{Z}_{mt} - \bar{Z}_{mc})$.

Proof. The proof of result (a) parallels the proof of Corollary 3.2 in
[16]; however, here, rather than simple averages of Z, W and Y, the averages
are weighted averages of the mixture components, weighted, for example, by
the $\alpha$'s in (1). Using the definition of Y,

$E(\bar{Y}_{mt} - \bar{Y}_{mc}) = \rho E(\bar{Z}_{mt} - \bar{Z}_{mc}) + (1 - \rho^2)^{1/2} E(\bar{W}_{mt} - \bar{W}_{mc}),$

where, by the definition of W,
$E(\bar{W}_{mt} - \bar{W}_{mc}) = \gamma' E(\bar{X}_{mt} - \bar{X}_{mc})$. From
Theorem 3.1, $E(\bar{X}_{mt} - \bar{X}_{mc}) \propto U$ and again from the definition
of W in equation (5), $\gamma'Z = 0$. Thus,

$E(\bar{Y}_{mt} - \bar{Y}_{mc}) = \rho E(\bar{Z}_{mt} - \bar{Z}_{mc}).$

Similarly, $E(\bar{Y}_{rt} - \bar{Y}_{rc}) = \rho E(\bar{Z}_{rt} - \bar{Z}_{rc})$ because
$E(\bar{W}_{rt} - \bar{W}_{rc}) = 0$, and result (a) of Corollary 4.1 follows.

Results (b) and (c) follow by analogous arguments. Situation (c) cannot
arise when K = 2 because, with only one treated and one control component,
$E(\bar{Z}_{rt} - \bar{Z}_{rc}) = 0$ implies that Z = 0. However, with multiple
components in the treated and control groups, the difference in weighted
averages ($E(\bar{Z}_{rt} - \bar{Z}_{rc})$) can equal 0 without all of the mixture
component centers ($\{\mu_k\}$) being 0. □



This corollary implies that affinely invariant matching that reduces bias 
in one direction cannot create bias in some other direction. If bias reduction 
is obtained along Z, it is also obtained for all Y. 
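
The mechanism behind the equal-percent-bias-reducing property can be seen with a few lines of arithmetic. In the sketch below the two expected mean-difference vectors are hypothetical numbers chosen only to be proportional to U, as Theorem 3.1 requires in canonical form; the ratio of matched to random bias is then the same for every linear combination whose coefficients are not orthogonal to U.

```python
import numpy as np

p = 4
U = np.ones(p)
bias_random = 0.8 * U    # hypothetical E(Xbar_rt - Xbar_rc), proportional to U
bias_matched = 0.2 * U   # hypothetical E(Xbar_mt - Xbar_mc), proportional to U

rng = np.random.default_rng(1)
betas = rng.normal(size=(5, p))          # five arbitrary linear combinations Y

# Ratio E(Ybar_mt - Ybar_mc) / E(Ybar_rt - Ybar_rc) for each Y = beta'X.
ratios = (betas @ bias_matched) / (betas @ bias_random)
print(ratios)
```

Every entry equals the ratio of the two scalar multipliers (0.25 with these numbers), which is the content of equation (7).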
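Corollary 4.2. The matching is $\rho^2$ proportionate modifying of the
variance of the difference in matched sample means,

(8)  $\dfrac{\mathrm{var}(\bar{Y}_{mt} - \bar{Y}_{mc})}{\mathrm{var}(\bar{Y}_{rt} - \bar{Y}_{rc})} = \rho^2 \dfrac{\mathrm{var}(\bar{Z}_{mt} - \bar{Z}_{mc})}{\mathrm{var}(\bar{Z}_{rt} - \bar{Z}_{rc})} + (1 - \rho^2) \dfrac{\mathrm{var}(\bar{W}_{mt} - \bar{W}_{mc})}{\mathrm{var}(\bar{W}_{rt} - \bar{W}_{rc})},$

where the ratios

$\dfrac{\mathrm{var}(\bar{Z}_{mt} - \bar{Z}_{mc})}{\mathrm{var}(\bar{Z}_{rt} - \bar{Z}_{rc})}$ and $\dfrac{\mathrm{var}(\bar{W}_{mt} - \bar{W}_{mc})}{\mathrm{var}(\bar{W}_{rt} - \bar{W}_{rc})}$

take the same two values for all Y.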

Proof. Using the definitions of Z and W in (4) and (5),

$\mathrm{cov}(\bar{Z}_{mt} - \bar{Z}_{mc}, \bar{W}_{mt} - \bar{W}_{mc}) = \dfrac{1}{\sqrt{p}}\, U'\, \mathrm{var}(\bar{X}_{mt} - \bar{X}_{mc})\, \gamma,$

which from Theorem 3.1 is proportional to

$U'(I + cUU')\gamma = U'\gamma + cpU'\gamma = 0,$

again using the definition of W in (5). Then, from the definition of Y in (6),

$\mathrm{var}(\bar{Y}_{mt} - \bar{Y}_{mc}) = \rho^2\, \mathrm{var}(\bar{Z}_{mt} - \bar{Z}_{mc}) + (1 - \rho^2)\, \mathrm{var}(\bar{W}_{mt} - \bar{W}_{mc}).$

Equation (8) follows because, in random subsamples, the samples from each
treated and control mixture component are independent with

$\mathrm{var}(\bar{Y}_{rt} - \bar{Y}_{rc}) = \mathrm{var}(\bar{Z}_{rt} - \bar{Z}_{rc}) = \mathrm{var}(\bar{W}_{rt} - \bar{W}_{rc}).$

Also, $\mathrm{var}(\bar{Y}_{rt}) = \mathrm{var}(\bar{Z}_{rt})$ and
$\mathrm{var}(\bar{Y}_{rc}) = \mathrm{var}(\bar{Z}_{rc})$, and each is a weighted
linear combination of the variances in each of the treated and control mixture
components, respectively. The final statement of Corollary 4.2 follows from
Corollary 3.1. □
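
The decomposition in (8) can be verified numerically once the covariance structure from Theorem 3.1 is assumed. In the sketch below the scale factor 0.6, the constant c and the vectors `beta` and `gamma` are hypothetical; the matched covariance is taken to be a multiple of $I + cUU'$ and the random-sampling covariance is taken to be spherical, as in the proof.

```python
import numpy as np

p = 4
U = np.ones(p)
z_dir = U / np.sqrt(p)

c = -0.15                                              # hypothetical, c >= -1/p
var_matched = 0.6 * (np.eye(p) + c * np.outer(U, U))   # var(Xbar_mt - Xbar_mc)
var_random = np.eye(p)                                 # var(Xbar_rt - Xbar_rc)

def ratio(direction):
    """Matched-to-random variance ratio for the combination direction'X."""
    return (direction @ var_matched @ direction) / (direction @ var_random @ direction)

gamma = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2.0)  # unit length, orthogonal to U
beta = np.array([0.5, -0.1, 0.7, 0.2])
beta = beta / np.linalg.norm(beta)
rho = beta @ z_dir

lhs = ratio(beta)
rhs = rho**2 * ratio(z_dir) + (1.0 - rho**2) * ratio(gamma)
print(np.isclose(lhs, rhs))
```

Replacing `gamma` with any other unit vector orthogonal to U leaves `ratio(gamma)` unchanged, which is why only two ratios are needed to describe the effect of matching on every Y.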

Corollary 4.3. Within each of the mixture components, the matching
is $\rho^2$ proportionate modifying of the expectation of the sample variances,

(9)  $\dfrac{E(v_{mk}(Y))}{E(v_{rk}(Y))} = \rho^2 \dfrac{E(v_{mk}(Z))}{E(v_{rk}(Z))} + (1 - \rho^2) \dfrac{E(v_{mk}(W))}{E(v_{rk}(W))},$

where $v_{rk}(\cdot)$ is the sample variance of $n_k$ randomly chosen units from
component k, and $v_{mk}(\cdot)$ is the sample variance of $n_k$ matched units from
component k ($k \in T$ or C), and the ratio

$E(v_{mk}(W)) / E(v_{rk}(W))$

takes the same value for all Y. The same is true for
$E(v_{mk}(Z)) / E(v_{rk}(Z))$.

Proof. In the matched sample from component $k \in T$ or C, the expected
covariance of Z and W is

$E(\mathrm{cov}_{mk}(Z, W)) = \dfrac{1}{\sqrt{p}}\, E(U' v_{mk}(X) \gamma) \propto U'(I + c_k UU')\gamma = 0,$

from Theorem 3.1 and the definition of W in (5), where
$E(v_{mk}(X)) \propto I + c_k UU'$ and the constants $c_k \geq -1/p$. Then, from (6),

$E(v_{mk}(Y)) = \rho^2 E(v_{mk}(Z)) + (1 - \rho^2) E(v_{mk}(W)).$

Equation (9) follows because $E(v_{rk}(Y)) = E(v_{rk}(Z)) = E(v_{rk}(W))$. The
final statement follows from Corollary 3.1. □

Note that the version of Corollary 4.3 stated for the full treated and 
control groups does not hold. In the special case considered in [16], there is 
only one component in each group. 

5. Conditionally affinely invariant matching with conditionally DMPES 
distributions. We now extend the results of the previous sections to a set- 
ting where a subset of the covariates is treated differently from the remain- 
der of the covariates, for example, exact matching on gender followed by 
discriminant matching, or Mahalanobis matching on key covariates within 
propensity score calipers [13]. Such matching was done, for example, in [15] 
when creating matched samples of current smokers and never smokers. 

We define $X^{(s)}$ to be the s "special covariates" spanning an s-dimensional
subspace (e.g., gender, race in the smoking example) and $X^{(r)}$ to be the
r = p - s remaining covariates spanning an r-dimensional subspace (e.g.,
education, age). The methods considered are "conditionally affinely invariant
matching methods" [16], which have the property that the result of the
matching is the same following any (full-rank) affine transformation of the
"remainder" covariates $X^{(r)}$:

$((X_t^{(s)}, X_t^{(r)}), (X_c^{(s)}, X_c^{(r)})) \to (T, C)$

implies

$((X_t^{(s)}, A(X_t^{(r)})), (X_c^{(s)}, A(X_c^{(r)}))) \to (T, C).$

In parallel with Section 2, we consider the case where each mixture
component of the full covariate distribution has mean vectors $\mu_k^{(s)}$ and
$\mu_k^{(r)}$, covariance matrices $\Sigma_k^{(s)}$ and $\Sigma_k^{(r)}$, and
conditional means and covariance matrices given by $\mu_k^{(r|s)}$ and
$\Sigma_k^{(r|s)}$. The full distribution of $X = (X^{(s)}, X^{(r)})$ across
both groups is a conditional DMPES distribution if (i) the conditional
distribution $X^{(r)} | X^{(s)}$ is ellipsoidal in each mixture component,
(ii) it has proportional conditional covariance matrices,
$\Sigma_k^{(r|s)} \propto \Sigma_{k'}^{(r|s)}$ for all k and k', and
(iii) it has centers such that
$(\mu_i^{(r|s)} - \mu_j^{(r|s)})\Sigma_k^{(r|s)-1} \propto (\mu_{i'}^{(r|s)} - \mu_{j'}^{(r|s)})\Sigma_{k'}^{(r|s)-1}$
for all $i, j, k, i', j', k' = 1, \dots, K$. Notice that condition (ii) implies
a common (across all mixture components) linear regression of the r
covariates in $X^{(r)}$ on the s covariates in $X^{(s)}$, with coefficients B.
As noted by [16], the special case with $X^{(s)}$ binomial or multinomial and
$X^{(r)}$ multivariate normal relates to the logistic regression model for
predicting treated or control status given the covariates, thus relating it
to the methods of propensity score estimation developed by Rosenbaum and
Rubin [12, 13].

We again can use a canonical form when a conditionally affinely invariant
matching method is used with a conditionally DMPES distribution. The
covariates $X^{(r)}$ are redefined as the components of $X^{(r)}$ uncorrelated
with $X^{(s)}$: $X^{(r)} - B'X^{(s)}$. The following notation is then used for the
moments of the distribution of $X^{(r)}$ [and the conditional moments of
$X^{(r)}$ given $X^{(s)}$]:

$\mu_k^{(r)} = \mu_k^{(r|s)} = \delta_k U, \qquad \Sigma_k^{(r)} = \Sigma_k^{(r|s)} = \sigma_k^2 I,$

$k = 1, \dots, K$, where $\delta_k$ and $\sigma_k^2$ are scalar constants, U is now
the r-dimensional unit vector, and I is now the r x r identity matrix. Thus,
the distributions of $(X^{(s)}, X^{(r)})$ and $X^{(r)}$ given $X^{(s)}$ are
exchangeable under permutations of components of $X^{(r)}$ conditional on
$X^{(s)}$ in each of the mixture components.
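
The first step of this canonical form, replacing $X^{(r)}$ by its part uncorrelated with $X^{(s)}$, is an ordinary regression residual. The sketch below works with a single simulated sample and a least-squares estimate of B, which is a simplification of the population regression described above; the dimensions, the coefficient matrix `B_true` and the noise model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, s, r = 500, 2, 3
x_s = rng.normal(size=(n, s))                  # special covariates X^(s)
B_true = np.array([[1.0, -0.5, 0.0],
                   [0.3, 0.0, 2.0]])           # hypothetical s x r coefficients
x_r = x_s @ B_true + rng.normal(size=(n, r))   # remainder covariates X^(r)

# Least-squares estimate of B from centered data (centering absorbs the intercept).
xs_c = x_s - x_s.mean(axis=0)
xr_c = x_r - x_r.mean(axis=0)
B_hat, *_ = np.linalg.lstsq(xs_c, xr_c, rcond=None)

# Redefined remainder covariates: X^(r) - B' X^(s), computed row-wise.
x_r_resid = x_r - x_s @ B_hat

# Sample cross-covariance between X^(s) and the redefined X^(r).
cross_cov = xs_c.T @ (x_r_resid - x_r_resid.mean(axis=0)) / (n - 1)
print(np.abs(cross_cov).max())
```

The cross-covariance is zero up to rounding, so the redefined remainder covariates are uncorrelated with the special covariates in the sample. A whitening and rotation step like the one used for the unconditional canonical form could then be applied to the residuals alone.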



Theorem 5.1. Suppose a conditionally affinely invariant matching method
is applied to random treated and control samples with conditional DMPES
distributions. Then, in canonical form,

$E(\bar{X}_{mt}^{(r)}) \propto U, \qquad E(\bar{X}_{mc}^{(r)}) \propto U$

and

$\mathrm{var}(\bar{X}_{mt} - \bar{X}_{mc}) = \begin{bmatrix} \mathrm{var}(\bar{X}_{mt}^{(s)} - \bar{X}_{mc}^{(s)}) & CU' \\ UC' & k(I + c_0 UU') \end{bmatrix},$

where $k \geq 0$, $c_0 \geq -1/r$ and $C = (c_1, \dots, c_s)'$. Also,

$E(v_{mt}(X)) = \begin{bmatrix} E(v_{mt}(X^{(s)})) & C_t U' \\ UC_t' & k_t(I + c_{t0} UU') \end{bmatrix},$

where $k_t \geq 0$, $c_{t0} \geq -1/r$, $C_t = (c_{t1}, c_{t2}, \dots, c_{ts})'$, with an
analogous result and notation for the matched control group. When Z = 0,
$E(\bar{X}_{mt}^{(r)}) = E(\bar{X}_{mc}^{(r)}) = 0$, $C = C_t = C_c = 0$, the zero
vector, and $c_0 = c_{t0} = c_{c0} = 0$.



Proof. The proof of this theorem parallels that of Theorem 3.1, with
the exception of the existence of the covariances between components in
$X^{(s)}$ and $X^{(r)}$. Due to the symmetry, these covariances are also
exchangeable in the coordinates of $X^{(r)}$. □

6. Effect on Y of matching with special covariates. In parallel with the
earlier formulation, we express an arbitrary linear combination of X as

$Y = \rho \mathcal{Z} + (1 - \rho^2)^{1/2} \mathcal{W},$

where $\mathcal{Z}$ and $\mathcal{W}$ are the standardized projections of Y along
and orthogonal to the subspace spanned by $\{X^{(s)}, Z\}$, respectively, and
$\rho$ is the correlation between Y and $\mathcal{Z}$. In this framework, Z is the
standardized discriminant of the covariates uncorrelated with $X^{(s)}$, again
expressed in canonical form as $Z = U'X^{(r)} / r^{1/2}$. When $\mu_k^{(r)} = 0$
for all k, Z is defined to be the zero vector, and then $\mathcal{Z}$ is defined
to be the projection of Y in the subspace spanned by $X^{(s)}$.

We write $\mathcal{Z}$ and $\mathcal{W}$ as

(10)  $\mathcal{Z} = \psi'X = (\psi^{(s)\prime}, \psi^{(r)\prime}) \begin{pmatrix} X^{(s)} \\ X^{(r)} \end{pmatrix},$

(11)  $\mathcal{W} = \gamma'X = (\gamma^{(s)\prime}, \gamma^{(r)\prime}) \begin{pmatrix} X^{(s)} \\ X^{(r)} \end{pmatrix}.$

Lemma 6.1. The coefficients $\gamma$ and $\psi$ satisfy

(12)  $\gamma^{(s)} = (0, \dots, 0)', \quad \gamma^{(r)\prime}\psi^{(r)} = Z'\gamma^{(r)} = 0 \quad$ and $\quad \psi^{(r)} \propto U.$

Proof. Equation (12) follows because $\mathcal{W}$ is a linear combination of
X uncorrelated with $\{X^{(s)}, Z\}$, and thus uncorrelated with $\{X^{(s)}\}$, and
because $\mathcal{Z}$ is uncorrelated with $\mathcal{W}$. The other results follow
from these and the definition of Z in canonical form. □

Because the symmetry results of Theorem 5.1 for X orthogonal to
$\mathcal{Z}$ imply that all $\mathcal{W}$ orthogonal to $\mathcal{Z}$ have the same
distribution, we immediately have the following corollary to Theorem 5.1.

Corollary 6.1. The quantities $\mathrm{var}(\bar{\mathcal{W}}_{mt} - \bar{\mathcal{W}}_{mc})$,
$E(v_{mt}(\mathcal{W}))$, and $E(v_{mc}(\mathcal{W}))$ take the same three values for
all standardized Y. Analogous results hold for statistics in random
subsamples indexed by rt and rc. In addition, $E(v_{mk}(\mathcal{W}))$ takes the
same value for all $\mathcal{W}$ within each of the mixture components,
$k \in T$ or C. However, the corresponding expressions involving $\mathcal{Z}$
generally do depend on the choice of Y.

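Corollary 6.2. (a) When $E(\bar{\mathcal{Z}}_{rt} - \bar{\mathcal{Z}}_{rc}) \neq 0$, the
percent bias reduction in Y equals the percent bias reduction of Y in the
subspace $\{X^{(s)}, Z\}$,

$\dfrac{E(\bar{Y}_{mt} - \bar{Y}_{mc})}{E(\bar{Y}_{rt} - \bar{Y}_{rc})} = \dfrac{E(\bar{\mathcal{Z}}_{mt} - \bar{\mathcal{Z}}_{mc})}{E(\bar{\mathcal{Z}}_{rt} - \bar{\mathcal{Z}}_{rc})}.$

(b) When $E(\bar{\mathcal{Z}}_{rt} - \bar{\mathcal{Z}}_{rc}) = 0$, the denominators of
both ratios in (a) equal 0, and
$E(\bar{Y}_{mt} - \bar{Y}_{mc}) = \rho E(\bar{\mathcal{Z}}_{mt} - \bar{\mathcal{Z}}_{mc})$.

Proof. The proof parallels that of Corollary 4.1 because
$\mathcal{W} = \gamma^{(r)\prime}X^{(r)}$ from the definition of $\mathcal{W}$ in (11)
and Lemma 6.1, and from Theorem 5.1 and Lemma 6.1,
$\gamma^{(r)\prime}E(\bar{X}_{mt}^{(r)} - \bar{X}_{mc}^{(r)}) = 0$. Thus,
$E(\bar{\mathcal{W}}_{mt} - \bar{\mathcal{W}}_{mc}) = 0$. □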

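Corollary 6.3. The matching is $\rho^2$ proportionate modifying of the
variance of the difference in matched sample means,

$\dfrac{\mathrm{var}(\bar{Y}_{mt} - \bar{Y}_{mc})}{\mathrm{var}(\bar{Y}_{rt} - \bar{Y}_{rc})} = \rho^2 \dfrac{\mathrm{var}(\bar{\mathcal{Z}}_{mt} - \bar{\mathcal{Z}}_{mc})}{\mathrm{var}(\bar{\mathcal{Z}}_{rt} - \bar{\mathcal{Z}}_{rc})} + (1 - \rho^2) \dfrac{\mathrm{var}(\bar{\mathcal{W}}_{mt} - \bar{\mathcal{W}}_{mc})}{\mathrm{var}(\bar{\mathcal{W}}_{rt} - \bar{\mathcal{W}}_{rc})},$

where the ratio
$\mathrm{var}(\bar{\mathcal{W}}_{mt} - \bar{\mathcal{W}}_{mc}) / \mathrm{var}(\bar{\mathcal{W}}_{rt} - \bar{\mathcal{W}}_{rc})$
takes the same value for all Y.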

Proof. The proof is analogous to that of Corollary 4.2 using Theorem 5.1
and Lemma 6.1, and parallels the proof of Corollary 4.3 in [16], where, in
that proof, there is a typographical error: $\bar{Z}_{mt} - \bar{Z}_{mc}$ and
$\bar{W}_{mt} - \bar{W}_{mc}$ should be replaced by
$\bar{\mathcal{Z}}_{mt} - \bar{\mathcal{Z}}_{mc}$ and
$\bar{\mathcal{W}}_{mt} - \bar{\mathcal{W}}_{mc}$, respectively. □

Corollary 6.4. Within each mixture component, the matching is $\rho^2$
proportionate modifying of the expectation of the sample variances,

$\dfrac{E(v_{mk}(Y))}{E(v_{rk}(Y))} = \rho^2 \dfrac{E(v_{mk}(\mathcal{Z}))}{E(v_{rk}(\mathcal{Z}))} + (1 - \rho^2) \dfrac{E(v_{mk}(\mathcal{W}))}{E(v_{rk}(\mathcal{W}))}$

for all $k \in T$ or C, where the ratio
$E(v_{mk}(\mathcal{W})) / E(v_{rk}(\mathcal{W}))$ takes the same value for all Y
within each mixture component.

Proof. The proof of this corollary parallels that of Corollary 4.3, with
modifications similar to those in the proof of Corollary 6.3. Again, as in
Corollary 4.3, this result generally holds only in each of the individual
treated and control group components, and the analogous result in the overall
samples does not hold. □

7. Discussion. Here we have shown that most of the results proven by
Rubin and Thomas [16] can be extended to discriminant mixtures of
proportional ellipsoidally symmetric (DMPES) distributions, as defined in
Section 2. This extension provides some theoretical rationale for why the
earlier Rubin and Thomas [16, 17, 18] results hold well even when the
assumption of ellipsoidally symmetric distributions is not met. These results show that even
with the more complicated setting of DMPES distributions, the effects of 
matching on an arbitrary linear combination of the covariates can be sum- 
marized by its effects along and orthogonal to the discriminant. 



Although the class of DMPES distributions is still restrictive, previous
experience has indicated that mathematically convenient conditions for
matching can provide guidance in real-world examples. A classic example is
in [2] on the bias reduction possible from stratified matching. Although
Cochran's results were proved assuming infinite sample sizes and a linear relationship
between a single covariate and the outcome, the approximations and their 
implied guidance have found applicability and use for a much wider range of 
situations. For a specific example here, the implications of our results were 
the basis for the applied diagnostics in [15] used to assess the quality of 
the matched samples of smokers and never smokers in the National Medical 
Expenditure Survey, based on decomposing the comparisons of the distribu- 
tions in the matched samples into components along and orthogonal to the 
discriminant. 